Interrater Agreement and Combining Ratings

Author

  • Richard S. Bogartz
Abstract

Some behaviors, such as smiles, require human raters for their measurement. A model of the rating process is explored that assumes the probability distribution of overt rating responses depends on which of several underlying or latent responses occurred. The ideal of theoretically identical raters is considered, as well as departures from such identity. Methods for parameter estimation and for assessing the goodness of fit of the model are presented, and a test of the hypothesis of identical raters is provided. Simulated data are used to explore different measures of agreement, optimal numbers of raters, how the ratings from multiple raters should be combined into a final score for subsequent analysis, and the consequences of departures from the basic assumptions of identical raters and constant underlying response probabilities. The results indicate that using two or three raters to rate all of the data, assessing the quality of their ratings through their pairwise correlations, and using their average rating rather than the ratings of a single primary rater will often provide improved measurement of the underlying response. Simulations show that reduced rater sensitivity to response differences can be compensated for by using more raters. Methods are considered for using the model to select raters and to gauge which aspects of the rating process need additional training. Suggestions are offered for using the model in pilot ratings of pilot data when deciding on the number of raters and the number of values on the rating scale.
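A minimal Python simulation sketch illustrates the recommendation above under assumed settings (a continuous latent response, raters who add independent Gaussian noise, and made-up values for n_trials, n_raters, and noise_sd); it is not the latent-response model fit in the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    n_trials, n_raters, noise_sd = 500, 3, 1.0   # hypothetical values for illustration

    # Assumed generative sketch: each trial has a continuous latent response,
    # and each rater reports it with independent measurement noise.
    latent = rng.normal(size=n_trials)
    ratings = latent[:, None] + rng.normal(scale=noise_sd, size=(n_trials, n_raters))

    # Pairwise correlations among raters gauge the quality of the ratings.
    print(np.corrcoef(ratings, rowvar=False).round(2))

    # Compare a single primary rater with the average of all raters as the
    # score carried forward to analysis.
    print("single rater  :", np.corrcoef(latent, ratings[:, 0])[0, 1].round(2))
    print("mean of raters:", np.corrcoef(latent, ratings.mean(axis=1))[0, 1].round(2))

With these settings the average of the raters will typically track the latent response more closely than any single rater does, which is the pattern behind the recommendation to average two or three raters rather than rely on one primary rater.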


Similar articles

Evaluating the Interrater Agreement of Process Capability Ratings

The reliability of process assessments has received some study in the recent past, much of it being conducted within the context of the SPICE trials. In this paper we build upon this work by evaluating the reliability of ratings on each of the practices that make up the SPICE capability dimension. The type of reliability that we evaluate is interrater agreement: the agreement amongst independen...


The intra- and interrater reliability of the action research arm test: a practical test of upper extremity function in patients with stroke.

OBJECTIVES To determine the intra- and interrater reliability of the Action Research Arm (ARA) test, to assess its ability to detect a minimal clinically important difference (MCID) of 5.7 points, and to identify less reliable test items. DESIGN Intrarater reliability of the sum scores and of individual items was assessed by comparing (1) the ratings of the laboratory measurements of 20 patie...


Cost Implications of Interrater Agreement for Software Process Assessments

Much empirical research has been done recently on evaluating and modeling interrater agreement in software process assessments. Interrater agreement is the extent to which assessors agree in their ratings of software process capabilities when presented with the same evidence and performing their ratings independently. This line of research was based on the premise that lack of interrater agreem...


Expert consensus ratings of job categories from the Third National Health and Nutrition Examination Survey (NHANES III).

BACKGROUND A method of occupational physical exposure assessment is needed to improve analyses using large data sets (e.g., national surveys) that provide only job title/category information as a proxy for exposure. METHODS Five ergonomic experts rated and arrived at consensus ratings for job categories used in the Third National Health and Nutrition Examination Survey. Interrater agreement w...


Interrater Reliability and Agreement of Subjective Judgments

Indexes of interrater reliability and agreement are reviewed and suggestions are made regarding their use in counseling psychology research. The distinction between agreement and reliability is clarified and the relationships between these indexes and the level of measurement and type of replication are discussed. Indexes of interrater reliability appropriate for use with ordinal and interval s...
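The agreement/reliability distinction drawn in that review can be made concrete with a two-rater toy example (hypothetical scores, not data from the review): scores that are perfectly correlated, and hence maximal on a relative-consistency reliability index, can still show no exact agreement when one rater is systematically more lenient.

    import numpy as np

    scores_a = np.array([1, 2, 3, 4, 5])   # rater A (hypothetical ordinal scores)
    scores_b = scores_a + 2                 # rater B scores every case 2 points higher
    print(np.corrcoef(scores_a, scores_b)[0, 1])  # reliability-style index: ~1.0
    print(np.mean(scores_a == scores_b))          # exact agreement proportion: 0.0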


Beyond kappa: A review of interrater agreement measures

In 1960, Cohen introduced the kappa coefficient to measure chance-corrected nominal scale agreement between two raters. Since then, numerous extensions and generalizations of this interrater agreement measure have been proposed in the literature. This paper reviews and critiques various approaches to the study of interrater agreement, for which the relevant data comprise either nominal or ordin...
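For orientation, Cohen's kappa corrects the proportion of observed agreement p_o for the agreement p_e expected by chance from each rater's marginal label frequencies: kappa = (p_o - p_e) / (1 - p_e). A short Python sketch of that standard computation (the function name and the example labels are illustrative):

    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Chance-corrected nominal-scale agreement between two raters."""
        n = len(rater_a)
        # Observed agreement: proportion of items given the same label.
        p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Chance agreement: sum over labels of the product of marginal proportions.
        count_a, count_b = Counter(rater_a), Counter(rater_b)
        labels = set(count_a) | set(count_b)
        p_e = sum((count_a[k] / n) * (count_b[k] / n) for k in labels)
        return (p_o - p_e) / (1 - p_e)

    print(cohens_kappa(list("AABBC"), list("ABBBC")))  # -> 0.6875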




Journal title:

Volume   Issue

Pages  -

Publication date: 2005